FMEM: A Fine-grained Memory Estimator for MapReduce Jobs

Authors

  • Lijie Xu
  • Jie Liu
  • Jun Wei
Abstract

MapReduce is designed as a simple and scalable framework for big data processing. Due to the lack of resource usage models, its implementation, Hadoop, leaves resource planning and optimization to users. Users, however, have difficulty choosing the right resource-related configurations, especially memory-related ones, without a good understanding of a job's memory usage. Modeling memory usage is challenging because many factors influence it, such as the framework's dataflow, user-defined programs, the large configuration space, and the JVM's memory management mechanism. To help both users and the framework analyze, predict, and optimize memory usage, we propose FMEM, a Fine-grained Memory Estimator for MapReduce jobs. FMEM contains a dataflow estimator that predicts the volume of data flowing among map and reduce tasks. Based on the dataflow and on memory utilization rules learned from many jobs, FMEM uses a rules-statistics method to estimate fine-grained memory usage in each generation of a task's JVM. Representative benchmarks show that FMEM can predict the memory usage of diverse jobs within 20% relative error. Furthermore, FMEM will be extended to find optimal dataflow and memory-related configurations.
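The "within 20% relative error" claim can be read with a small sketch. The abstract does not define its error metric, so the code below assumes the common definition, |estimated - measured| / measured. The function names and the sample numbers are hypothetical and are not taken from the paper.

```python
def relative_error(estimated_mb: float, measured_mb: float) -> float:
    """Relative error of a memory estimate against a measured value.

    Assumed definition: |estimated - measured| / measured.
    The paper's abstract does not spell out its exact metric.
    """
    if measured_mb <= 0:
        raise ValueError("measured_mb must be positive")
    return abs(estimated_mb - measured_mb) / measured_mb


def within_tolerance(estimated_mb: float, measured_mb: float,
                     tolerance: float = 0.20) -> bool:
    """True if the estimate is within the given relative-error bound."""
    return relative_error(estimated_mb, measured_mb) <= tolerance


# Illustrative values only (not results from the paper):
# an estimate of 450 MB against a measured peak of 500 MB.
print(within_tolerance(450.0, 500.0))
```

Under this assumed definition, an estimate passes the 20% bound when it lies between 80% and 120% of the measured value.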

Related resources

MROrchestrator: A Fine-Grained Resource Orchestration Framework for Hadoop MapReduce

Efficient resource management in data centers and clouds running large distributed data processing frameworks like Hadoop is crucial for enhancing the performance of hosted MapReduce applications, and boosting the resource utilization. However, existing resource scheduling schemes in Hadoop allocate resources at the granularity of fixed-size, static portions of the nodes, called slots. A slot r...

Bitonic-MapReduce: Optimization of MapReduce on the Cell B.E. Architecture with a Bitonic Sort Senior Honors Thesis

The Cell B.E. Architecture is a novel, heterogeneous, multi-core architecture that offers opportunities for significant performance. However, a lack of programmer familiarity with explicitly parallelizing code and difficulty using its unique software-managed memory model make writing programs for the Cell difficult, even for experienced programmers. However, if tools can be made to abstract awa...

Hadoop Performance Models

Hadoop MapReduce is now a popular choice for performing large-scale data analytics. This technical report describes a detailed set of mathematical performance models for describing the execution of a MapReduce job on Hadoop. The models describe dataflow and cost information at the fine granularity of phases within the map and reduce tasks of a job execution. The models can be used to estimate t...

Analyzing and Accelerating Runtime Systems on Multicore Architecture

TIWARI, DEVESH. Analyzing and Accelerating Runtime Systems on Multicore Architecture. (Under the direction of Yan Solihin.) Technology scaling has made multicore architectures commercially prevalent. However, exploiting multicore parallelism for performance remains challenging for programmers, because of side-effects of parallel programming such as concurrency management, data-races, deadlocks ...

Large-scale seismic signal analysis with Hadoop

In seismology, waveform cross correlation has been used for years to produce high-precision hypocenter locations and for sensitive detectors. Because correlated seismograms generally are found only at small hypocenter separation distances, correlation detectors have historically been reserved for spotlight purposes. However, many regions have been found to produce large numbers of correlated se...

Publication date: 2013